Estimating the Quality of Data in Relational Databases
Abstract
With more and more electronic information sources becoming widely available, the issue of the quality of these often-competing sources has become germane. We propose a standard for rating information sources with respect to their quality. An important consideration is that the quality of information sources often varies considerably when specific areas within these sources are considered. This implies that the assignment of a single rating of quality to an information source is usually unsatisfactory. Of course, to the user of an information source, the overall quality of the source may not be as important as the quality of the specific information that this user is extracting from the source. Therefore, methods must be developed that will derive reliable estimates of the quality of the information provided to users from the quality specifications that have been assigned to the sources. Our work here bears on all these concerns. We describe an approach that uses dual quality measures that gauge the distance of the information in a database from the truth. We then propose to combine manual verification with statistical methods to arrive at useful estimates of the quality of databases. We account for the variance in quality by isolating areas of databases that are homogeneous with respect to quality, and then estimating the quality of each separate area. These composite estimates may be regarded as quality specifications that will be affixed to each database. Finally, we show how to derive quality estimates for individual queries from such quality specifications.
This work was supported in part by DARPA grants N0014-92-J-4038 and N0060-96-D-3202.
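As a rough illustration of dual quality measures and of estimation by manual verification, the sketch below treats a stored relation and the true relation as sets of tuples. It is a minimal sketch under stated assumptions, not the authors' method: the names `soundness`, `completeness`, and `estimate_soundness`, the ratio definitions, and the example data are all hypothetical, and the sampling step is plain simple random sampling standing in for whatever statistical method the paper uses.

```python
import random

def soundness(stored: set, truth: set) -> float:
    """Assumed definition: fraction of stored tuples that are true."""
    return len(stored & truth) / len(stored) if stored else 1.0

def completeness(stored: set, truth: set) -> float:
    """Assumed definition: fraction of true tuples that are stored."""
    return len(stored & truth) / len(truth) if truth else 1.0

def estimate_soundness(stored: set, is_correct, sample_size: int,
                       rng: random.Random) -> float:
    """Estimate soundness by checking only a random sample of tuples.

    `is_correct` stands in for manual verification of one tuple.
    """
    items = list(stored)
    k = min(sample_size, len(items))
    if k == 0:
        return 1.0
    sample = rng.sample(items, k)
    return sum(1 for t in sample if is_correct(t)) / k

# Hypothetical data: (name, age) tuples.
stored = {("alice", 30), ("bob", 25), ("carol", 41), ("dave", 19)}
truth = {("alice", 30), ("bob", 25), ("carol", 40),
         ("erin", 33), ("frank", 52)}

print(soundness(stored, truth))     # 0.5
print(completeness(stored, truth))  # 0.4
# Sampling every tuple reproduces the exact soundness.
print(estimate_soundness(stored, lambda t: t in truth, 4, random.Random(0)))  # 0.5
```

In practice the sample would be much smaller than the relation, and a separate estimate would be computed for each area of the database that is homogeneous with respect to quality.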
Similar resources
Estimating the Quality of Databases
With more and more electronic information sources becoming widely available, the issue of the quality of these often-competing sources has become germane. We propose a standard for specifying the quality of databases, which is based on the dual concepts of data soundness and data completeness. The relational model of data is extended by associating a quality specification with each relation ins...
Investigating the Impact of Information Quality on Relationship Marketing with Mediating Role of Salespeople’ Relational Competency: Survey about Iranian ISP
Despite the vital role of information in relational-oriented firms, there are limited studies on the impact of information quality on relationship marketing. To address this gap, this study develops a conceptual model to examine the impact of information quality on the successful implementation of relationship marketing by assessing the mediating role of salespeople's relational competency. The...
Adaptive-clustering Based Method to Estimate Null Values in Relational Databases
Data preprocessing is an essential step of knowledge discovery. Data preprocessing comprises data cleaning, data integration, data transformation, data reduction and data discretization. Estimating null values is a task of data cleaning. Null values in a database are significant sources of poor data quality. Therefore, the appropriate handling of null values is an important task of data preproc...
Relational Databases Query Optimization using Hybrid Evolutionary Algorithm
Optimizing database queries is a hard research problem. Exhaustive search techniques such as dynamic programming are suitable for queries with a few relations, but as the number of relations in a query increases, they demand too much memory and processing to remain practical, so randomized and evolutionary methods must be used instead. The use of evolutionary methods, beca...
Optimizing turning operation of St37 steel using grey relational analysis
Nowadays, various optimization methods have been proposed to minimize production cost in machining operations. Since the turning operation has several parameters that affect workpiece quality, it was selected as a complicated manufacturing method in this paper. To reach sufficient quality, all influencing parameters such as cutting speed, feed rate, depth of cut and tool rake angl...
Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML
As we move into the big-data world, where unstructured data grows at high velocity, a data store is needed to hold this large volume of data. Traditionally, relational databases have been used, but they are not suited to handling data at this scale, so a move to non-relational data stores is needed. In the current study, we have proposed an extension of the Mongo...
Publication date: 1996